Decomposing past and future: Integrated information decomposition based on shared probability mass exclusions

A core feature of complex systems is that the interactions between elements in the present causally constrain their own futures, and the futures of other elements as the system evolves through time. To fully model all of these interactions (between elements, as well as ensembles of elements), it is possible to decompose the total information flowing from past to future into a set of non-overlapping temporal interactions that describe all the different modes by which information can be stored, transferred, or modified. To achieve this, I propose a novel information-theoretic measure of temporal dependency (Iτsx) based on the logic of local probability mass exclusions. This integrated information decomposition can reveal emergent and higher-order interactions within the dynamics of a system, as well as refining existing measures. To demonstrate the utility of this framework, I apply the decomposition to spontaneous spiking activity recorded from dissociated neural cultures of rat cerebral cortex to show how different modes of information processing are distributed over the system. Furthermore, being a localizable analysis, Iτsx can provide insight into the computational structure of single moments. I explore the time-resolved computational structure of neuronal avalanches and find that different types of information atoms have distinct profiles over the course of an avalanche, with the majority of non-trivial information dynamics happening before the first half of the cascade is completed. These analyses allow us to move beyond the historical focus on single measures of dependency such as information transfer or information integration, and explore a panoply of different relationships between elements (and groups of elements) in complex systems.


Introduction
What does it mean for a complex system to have "structure," or even to be a "system" at all? Nature abounds with systems: almost every object, when examined closely enough, is actually a composite structure made up of many interacting components. The world is a dynamic congeries of complex interactions and relationships, and it is those relationships that define the nature and structure of the systems of which they are a part. For a system to have "structure," its behaviour in the future must be some consequence of its behaviour in the past. When parts of the system interact, the states of individual elements, or ensembles of elements, constrain their own possible futures, the futures of the components they interact with, and ultimately, the future of the system as a whole. For example, a single neuron embedded in a neuronal network might fire at some time t − τ. That firing constrains its own future (albeit transiently) due to subsequent hyperpolarization and the refractory period. It also informs on the possible futures of all the post-synaptic neurons to which it is coupled: the probability that they will fire changes after one of their parents fires, and so on. In particular cases, the firing of a single neuron (or just a few neurons) may radically constrain the future of the entire brain (for example, if it triggers an epileptic seizure). The entire scientific endeavour is, in some sense, built on uncovering these dependencies and understanding their specifics. For a complex system X, made up of many interacting parts, the excess entropy [1] quantifies the total degree to which its future can be predicted from its past:
$$E(X) = I(X_{-\infty:t}; X_{t:\infty})$$
where $X_{-\infty:t}$ corresponds to the joint state of every element in X at every time from the infinite past up to time t. The second term, $X_{t:\infty}$, indicates the joint state of every element at every time from t to the infinite future (I adopt the Python-like notation from [2]).
Accounting for extended periods of past and future can reveal dependencies of varying durations (e.g. distance-related delays in communication networks). In practice, however, infinite data cannot be recorded, so the full excess entropy is typically inaccessible. The situation is considerably easier for Markovian systems, where the excess entropy reduces to the mutual information between a moment and its immediate past (possibly incorporating a lag of τ moments):
$$E(X) = I(X_{-\tau}; X_t)$$
For example, consider a two-element system with Markovian dynamics: $X = \{X^1, X^2\}$ (following [3], I use superscripts to denote indices and subscripts to denote time). We can compute the lag-τ excess entropy of X as a whole as:
$$E(X) = I(X^1_{-\tau}, X^2_{-\tau}; X^1_t, X^2_t)$$
The excess entropy is an extremely coarse measure. It aggregates all of the temporal statistical dependencies within a multivariate system, at every scale, into a single number. For a more "complete" understanding of the dependencies within a system, it would be useful to decompose the excess entropy into non-overlapping components. These components would describe how particular elements (and ensembles of elements) constrain each other as the system evolves through time. For example, how does the state of $X^1$ at time t − τ constrain its own future? How does it constrain the future of $X^2$? Other, more exotic dependencies are also possible. For instance, the joint state of $X^1_{-\tau}$ and $X^2_{-\tau}$ together may constrain the future of just $X^1_t$ (a phenomenon sometimes referred to as "downward causation", which has been the subject of intense philosophical debate [4,5]). There may also be information about the future of $X^1_t$ that is redundantly disclosed by observing either $X^1_{-\tau}$ alone or $X^2_{-\tau}$ alone, and so on. How can all of these different dependencies be untangled?
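The lag-τ excess entropy of a Markovian two-element system can be computed directly from its transition probabilities. The sketch below does this for a hypothetical "noisy swap" system, chosen only for illustration and not analysed in this paper. In this system, each binary element copies the other element's past state, flipped with probability eps. The code assumes a uniform past distribution, which is stationary for this symmetric toy system, and sums over the joint transition distribution to obtain $I(X_{-\tau}; X_t)$ in bits.

```python
import itertools
import math

# Global states (x1, x2) of a hypothetical two-element binary system.
STATES = list(itertools.product([0, 1], repeat=2))

def noisy_swap(eps):
    """P(X_t = s' | X_{-tau} = s) for a toy system in which each element copies the
    other element's past state, flipped with probability eps (illustrative assumption)."""
    T = {}
    for (a, b) in STATES:
        for (c, d) in STATES:
            p1 = 1 - eps if c == b else eps   # X^1_t copies X^2_{-tau}
            p2 = 1 - eps if d == a else eps   # X^2_t copies X^1_{-tau}
            T[(a, b), (c, d)] = p1 * p2
    return T

def lagged_excess_entropy(T, p_past):
    """I(X_{-tau}; X_t) in bits, assuming p_past is a stationary distribution of T."""
    p_future = {s2: sum(p_past[s1] * T[s1, s2] for s1 in STATES) for s2 in STATES}
    # Stationarity check: the future marginal must match the past marginal.
    assert all(math.isclose(p_future[s], p_past[s]) for s in STATES)
    mi = 0.0
    for s1 in STATES:
        for s2 in STATES:
            p_joint = p_past[s1] * T[s1, s2]
            if p_joint > 0:
                mi += p_joint * math.log2(p_joint / (p_past[s1] * p_future[s2]))
    return mi

uniform = {s: 0.25 for s in STATES}   # stationary for this symmetric toy system
print(lagged_excess_entropy(noisy_swap(0.0), uniform))   # 2.0
print(lagged_excess_entropy(noisy_swap(0.1), uniform))   # about 1.062
```

Without noise, the entire 2 bits of the past state are carried into the future. This single number does not reveal that the bits have swapped places, which is exactly the kind of structure the decomposition developed below aims to expose.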
One possible path forward comes from the field of information decomposition. Classically, information decomposition asks how different ensembles of predictor variables collectively disclose information about a single target variable [6,7]. Williams and Beer introduced the partial information decomposition (PID) framework in 2010 [6]. Since then, researchers in complex systems science, information theory, and theoretical neuroscience have collectively worked to deepen our understanding of multivariate information and higher-order dependencies. Recently, Mediano, Rosas, and others introduced a multi-target information decomposition, the integrated information decomposition (ΦID) [3,8,9]. The ΦID extends the original PID framework to multiple targets, enabling a full decomposition of the excess entropy into non-overlapping, "atomic" components (which I will refer to as ΦI, or integrated information, atoms). The ΦID is a considerable leap forward in our understanding of multivariate, temporal information. However, like the original PID, it lacks a crucial element required for applications to real data: an operational definition of multivariate redundancy.
In this work, I propose such a redundancy function, termed $I_{\tau sx}$. The proposed measure builds on a recent single-target measure introduced by Makkeh et al. [10]. It generalizes the classic Shannon mutual information function to ensembles of multiple interacting elements that may redundantly disclose information about each other. I begin by reviewing the classic, single-target PID, before generalizing to the ΦID. I then introduce the $I_{\tau sx}$ measure and demonstrate its application. I first apply it to three constructed Markovian systems designed to display distinct dynamics, and finally to empirical neuronal spiking data recorded from dissociated cultures of rat cerebral cortex [11,12]. I conclude by discussing the strengths and limitations of the measure, and of the ΦID framework itself.

Partial information decomposition
1.1.1 Intuition & the bivariate case.
Consider the simple case with two predictor variables ($X^1$ and $X^2$) that jointly disclose information about a target variable Y. Basic information theory gives us the tools to assess how each $X^i$ individually informs on Y (the marginal mutual informations, e.g. $I(X^i; Y)$). It also tells us how the joint state of $X^1$ and $X^2$ together informs on Y: $I(X^1, X^2; Y)$. The relationship between the marginal and joint mutual informations is not always straightforward, however. Depending on the context, the sum of both marginal mutual informations can be greater than or less than the joint mutual information.

If $I(X^1; Y) + I(X^2; Y) > I(X^1, X^2; Y)$, then some information about Y must be redundantly present in both $X^1$ and $X^2$ individually. When the two marginal mutual informations are summed, that redundant information is "double counted." Conversely, if $I(X^1; Y) + I(X^2; Y) < I(X^1, X^2; Y)$, then there is information about Y in the joint state of $X^1$ and $X^2$ that is only accessible when the two are considered together, and not by looking at either $X^i$ individually. These comparisons of "wholes" to "parts" are only rough heuristics, however, as redundant and synergistic information can coexist in a set of predictor variables [6]. The direction of the inequality only indicates whether synergistic or redundant information dominates the interaction.
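The two inequalities above can be checked numerically on two hypothetical toy distributions. The first is a "copy" distribution, where $X^1 = X^2 = Y$ is a single fair bit. The second is an "XOR" distribution, where Y is the exclusive-or of two independent fair bits. The sketch below computes plug-in mutual informations from a dictionary mapping each configuration $(x^1, x^2, y)$ to its probability.

```python
import math
from collections import defaultdict

def mutual_information(joint, src, tgt):
    """I(X_src; X_tgt) in bits; joint maps configuration tuples to probabilities,
    src and tgt are tuples of positions within each configuration."""
    p_s, p_t, p_st = defaultdict(float), defaultdict(float), defaultdict(float)
    for cfg, p in joint.items():
        s = tuple(cfg[i] for i in src)
        t = tuple(cfg[i] for i in tgt)
        p_s[s] += p
        p_t[t] += p
        p_st[s, t] += p
    return sum(p * math.log2(p / (p_s[s] * p_t[t])) for (s, t), p in p_st.items() if p > 0)

# Configurations are (x1, x2, y); both distributions are illustrative assumptions.
copy = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}                        # X1 = X2 = Y
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}     # Y = X1 XOR X2

for name, joint in [("copy", copy), ("xor", xor)]:
    marginals = mutual_information(joint, (0,), (2,)) + mutual_information(joint, (1,), (2,))
    whole = mutual_information(joint, (0, 1), (2,))
    print(name, marginals, whole)
# copy: 2.0 > 1.0 (redundancy dominates); xor: 0.0 < 1.0 (synergy dominates)
```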
The seminal contribution of Williams and Beer was a mathematical framework that allowed for a complete decomposition of the joint mutual information into non-overlapping, additive "atoms" of information:
$$I(X^1, X^2; Y) = \mathrm{Red}(X^1, X^2; Y) + \mathrm{Unq}(X^1; Y / X^2) + \mathrm{Unq}(X^2; Y / X^1) + \mathrm{Syn}(X^1, X^2; Y)$$
The terms are defined as follows:
- $\mathrm{Red}(X^1, X^2; Y)$ is the redundant information about Y that could be learned by observing either $X^1$ or $X^2$ individually.
- $\mathrm{Unq}(X^1; Y / X^2)$ is the information about Y that is uniquely disclosed by $X^1$ (in the context of $X^2$), and vice versa for the other unique atom.
- $\mathrm{Syn}(X^1, X^2; Y)$ is the synergistic information about Y that can only be learned by observing $X^1$ and $X^2$ simultaneously.

Furthermore, the marginal mutual informations can be broken down into the same atomic components:
$$I(X^1; Y) = \mathrm{Red}(X^1, X^2; Y) + \mathrm{Unq}(X^1; Y / X^2)$$
$$I(X^2; Y) = \mathrm{Red}(X^1, X^2; Y) + \mathrm{Unq}(X^2; Y / X^1)$$
In the case of two predictor variables, the result is an underdetermined system with three known values (the three mutual information terms) and four unknown values (the partial information atoms). If any one atom can be determined, then the remaining three are resolved "for free." Classical information theory does not provide specific functions for any of these terms [13], so their development is an area of active, ongoing research. It is most common to begin by defining a redundancy function [6], although approaches based on defining unique [14,15] and synergistic information [16,17] have also been proposed. Unfortunately, if the number of sources is greater than two, the decompositions of the joint and marginal mutual informations are far less constrained. For three sources, 18 atoms must be recovered from only seven mutual information terms, and more advanced mathematical machinery is required to decompose the joint mutual information.
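In the bivariate case, one redundancy value resolves the remaining three atoms. The sketch below shows this "for free" step using the three equations above. The redundancy values plugged in (1 bit for copy, 0 bit for XOR) are inputs assumed for illustration. They coincide with what Williams and Beer's original $I_{min}$ assigns to these two distributions, but other redundancy functions can disagree.

```python
def bivariate_pid(i_x1, i_x2, i_joint, red):
    """Solve the two-source PID given I(X1;Y), I(X2;Y), I(X1,X2;Y) and a redundancy
    value supplied by some (unspecified) redundancy function."""
    unq1 = i_x1 - red                    # I(X1;Y) = Red + Unq(X1;Y/X2)
    unq2 = i_x2 - red                    # I(X2;Y) = Red + Unq(X2;Y/X1)
    syn = i_joint - red - unq1 - unq2    # whatever remains of I(X1,X2;Y)
    return {"red": red, "unq1": unq1, "unq2": unq2, "syn": syn}

# Copy (X1 = X2 = Y, one fair bit): every mutual information term equals 1 bit.
print(bivariate_pid(1.0, 1.0, 1.0, red=1.0))   # all redundancy
# XOR (Y = X1 XOR X2): marginal terms are 0, the joint term is 1 bit.
print(bivariate_pid(0.0, 0.0, 1.0, red=0.0))   # all synergy
```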

1.1.2 The partial information lattice & Möbius inversion.
Consider a collection of N predictor variables $X = \{X^1, \ldots, X^N\}$ jointly informing on a single target Y. We are interested in understanding how every $X^i \in X$ (and every ensemble of $X^i$s joined by logical conjunction) discloses information about the target. This requires understanding all the ways that the elements of X can redundantly, uniquely, and synergistically share information. Williams and Beer assumed a measure of redundant (or shared) information between some collection of sources and the target, denoted $I_\cap(\cdot; Y)$ here. Given such a measure, they showed that the "atomic" components of the joint mutual information are constrained into a partially ordered set called the partial information lattice. The derivation of the lattice is briefly described below; see Gutknecht et al. [7] for a more complete discussion.
We begin by defining the set of sources that may disclose information about Y. This is given by the set of all subsets of X, excluding the empty set, denoted $\mathcal{P}_1(X)$. Every (potentially multivariate) source can be thought of as an aggregated macro-variable, whose state is defined by the logical-AND operator over all of its constituent elements. For example, if our predictor variables are $X^1$, $X^2$ and $X^3$, then the collection of sources is:
$$\mathcal{P}_1(X) = \{\{X^1\}, \{X^2\}, \{X^3\}, \{X^1, X^2\}, \{X^1, X^3\}, \{X^2, X^3\}, \{X^1, X^2, X^3\}\}$$
Consider some (potentially overlapping) collection of sources $A_1, \ldots, A_k$. The redundancy function $I_\cap(A_1, \ldots, A_k; Y)$ quantifies the information about Y that can be learned by observing $A_1 \lor \ldots \lor A_k$. The domain of $I_\cap(\cdot; Y)$ is given by the set of all collections of sources such that no source is a subset of any other:
$$\mathcal{A} = \{\alpha \in \mathcal{P}_1(\mathcal{P}_1(X)) : \forall A_i, A_j \in \alpha,\ A_i \not\subset A_j\}$$
This restriction means that $\mathcal{A}$ is partially ordered:
$$\forall \alpha, \beta \in \mathcal{A}:\quad \alpha \preceq \beta \iff \forall B \in \beta,\ \exists A \in \alpha : A \subseteq B$$
The resulting lattice $\langle \mathcal{A}, \preceq \rangle$ provides the scaffolding on which the full PID may be constructed. Every $\alpha \in \mathcal{A}$ corresponds to a vertex on the lattice, and the ordering reveals a structure of increasingly synergistic information-sharing relationships. For a visualization of the partial information lattices for sets of two and three predictor variables, see Fig 1. Once the lattice structure and the as-yet-undefined redundancy function $I_\cap(\cdot; Y)$ are in place, the PID can be solved for every $\alpha \in \mathcal{A}$ using a Möbius inversion:
$$I_\partial(\alpha; Y) = I_\cap(\alpha; Y) - \sum_{\beta \prec \alpha} I_\partial(\beta; Y)$$
Each partial information atom is defined recursively as the difference between two quantities: the redundant information disclosed by a particular collection of sources, and the sum of all atoms lower on the lattice. In this way, the joint mutual information between an arbitrary number of predictor variables and a single target can be decomposed into non-overlapping components.

Fig 1. Examples of partial information lattices for the two simplest possible systems of multiple sources predicting a single target. On the left is the lattice for two predictor variables, and on the right is the lattice for three predictor variables. Following the notation introduced by Williams and Beer [6], sources are denoted just by index. For example, {1}{2} is the information redundantly disclosed by $X^1$ or $X^2$, {1}{23} is the information disclosed by $X^1$ or ($X^2$ and $X^3$), etc. https://doi.org/10.1371/journal.pone.0282950.g001
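The lattice construction and the Möbius inversion above can be sketched in a few lines. In this sketch, sources are frozensets of element indices, a lattice vertex is a frozenset of sources, and redundancy values are supplied from outside. The "copy" values used in the usage example are hypothetical inputs and are not computed by any particular measure. The enumeration recovers the 4 vertices of the two-source lattice and the 18 vertices of the three-source lattice shown in Fig 1.

```python
from itertools import combinations

def nonempty_subsets(items):
    """All non-empty subsets of items, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]

def redundancy_lattice(n):
    """Collections of sources over elements 1..n in which no source is a proper subset of another."""
    sources = nonempty_subsets(range(1, n + 1))            # P_1(X)
    return [alpha for alpha in nonempty_subsets(sources)   # candidates drawn from P_1(P_1(X))
            if not any(a < b for a in alpha for b in alpha)]

def precedes(alpha, beta):
    """alpha <= beta iff every source in beta contains some source in alpha."""
    return all(any(a <= b for a in alpha) for b in beta)

def mobius_inversion(lattice, redundancy):
    """I_partial(alpha) = I_cap(alpha) - sum of I_partial(beta) over beta strictly below alpha."""
    # Nodes with smaller down-sets come first, so every strictly lower node is solved earlier.
    order = sorted(lattice, key=lambda a: sum(precedes(b, a) for b in lattice))
    partial = {}
    for alpha in order:
        below = [b for b in lattice if b != alpha and precedes(b, alpha)]
        partial[alpha] = redundancy[alpha] - sum(partial[b] for b in below)
    return partial

def node(*sources):
    """Helper: node({1}, {2}) is the vertex written {1}{2} in the text."""
    return frozenset(frozenset(s) for s in sources)

print(len(redundancy_lattice(2)), len(redundancy_lattice(3)))   # 4 18

# Hypothetical redundancy values for the two-source copy system (X1 = X2 = Y).
copy_red = {node({1}, {2}): 1.0, node({1}): 1.0, node({2}): 1.0, node({1, 2}): 1.0}
atoms = mobius_inversion(redundancy_lattice(2), copy_red)
print(atoms[node({1}, {2})], atoms[node({1, 2})])   # 1.0 0.0
```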

Integrated information decomposition
With the basic PID defined, it is possible to do a partial examination of the excess entropy. For example, Varley and Hoel [18] decomposed the joint mutual information between all elements at time t − τ and the joint state of the whole system at time t: $I(X^1_{-\tau}, \ldots, X^N_{-\tau}; X_t)$. This method provides insights into how the states of particular elements (and ensembles of elements) collectively constrain the future of the whole system. However, it provides limited insight into how parts of the system constrain each other, as the future state is aggregated into a single "whole." To partially address this issue, one could imagine doing a PID of the information that every element of the whole system at time t − τ discloses about every set of elements at time t. This would mean decomposing, in turn, the mutual information between all past elements and each subset of future elements.

While potentially illuminating, this approach is still limited because it does not readily allow for notions of "redundancy" and "synergy" between target elements. For example, it might be natural to ask: "what information synergistically disclosed by $X^i_{-\tau}$ and $X^j_{-\tau}$ about $X^m_t$ also applies to $X^n_t$?" (i.e. is redundantly "copied" over both elements). Similarly, consider decomposing $I(X^1_{-\tau}, \ldots, X^N_{-\tau}; X^i_t, X^j_t)$. One might want to know what information about the joint state of $X^i_t$ and $X^j_t$ could not be learned by decomposing $I(X^1_{-\tau}, \ldots, X^N_{-\tau}; X^i_t)$ or $I(X^1_{-\tau}, \ldots, X^N_{-\tau}; X^j_t)$ alone, i.e. the information that is synergistically present in the joint state of $X^i_t$ and $X^j_t$ together. A complete decomposition of the excess entropy therefore requires generalizing the PID framework to account for redundancies and synergies in both the past and future of the system under study.
To address this, Mediano et al. [3,8,9,19] recently introduced a generalization of the PID that allows multiple sources to be decomposed onto multiple targets. Called the integrated information decomposition (ΦID), this framework allows for a complete decomposition of the excess entropy.
The integrated information decomposition begins by defining a product lattice $\mathcal{A}^2 = \mathcal{A} \times \mathcal{A}$, where $\mathcal{A}$ is the single-target redundancy lattice derived above. Each vertex in $\mathcal{A}^2$ is defined by an ordered pair $\alpha \to \beta$, with $\alpha, \beta \in \mathcal{A}$. In the case of a temporal process, α refers to a particular collection of sources observed at time t − τ that disclose information about β, a collection of sources observed at time t.
As with the single-target partial information lattice, the product lattice is a partially ordered set, with:
$$\alpha \to \beta \preceq \alpha' \to \beta' \iff \alpha \preceq \alpha' \text{ and } \beta \preceq \beta'$$
Given a suitable temporal redundancy function $I_\cap^{\alpha \to \beta}$, the integrated information lattice can be similarly solved via Möbius inversion. For a visualization of the integrated information lattice for the case of two sources and two targets, see Fig 2.

The ΦID framework deviates from the PID framework in one key way. In the original formulation by Williams and Beer, the lattice (which motivates the Möbius inversion) is derived from the axiomatic properties of the proposed redundancy measure. There has never been universal agreement on the specific definition of "redundancy". Even so, any function that satisfies the original axioms can be shown to induce the lattice, because the lattice follows from the definition of redundancy itself. In contrast, in the ΦID framework, the double-redundancy lattice is not derived from the properties of the $I_\cap^{\alpha \to \beta}$ function. Rather, it is imposed by the product of the "marginal" PI lattices. To address this, Mediano et al. imposed a compatibility constraint on any double-redundancy function [3,8]. Consider two (potentially, but not necessarily, multivariate) variables X and Y, and a double-redundancy atom $\alpha \to \beta$ with $\alpha, \beta \in \mathcal{A}$. The compatibility axiom requires that, if one of the variables (X or Y) is univariate, the double-redundancy function reduces to a classic, single-target redundancy function, and the ΦID reduces to the classic PID.
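The product ordering can be checked directly for the two-element case. The sketch below writes out the four vertices of the single-target lattice by hand and forms all 16 ordered pairs α → β. It then confirms that {1}{2} → {1}{2} sits at the bottom and {12} → {12} at the top. Encoding vertices as frozensets of frozensets is an assumption of this sketch.

```python
from itertools import product

def node(*sources):
    """node({1}, {2}) encodes the vertex written {1}{2} in the text."""
    return frozenset(frozenset(s) for s in sources)

# The four vertices of the two-element redundancy lattice, written out by hand.
A = [node({1}, {2}), node({1}), node({2}), node({1, 2})]

def precedes(alpha, beta):
    """Single-target ordering: every source in beta contains some source in alpha."""
    return all(any(a <= b for a in alpha) for b in beta)

def precedes2(x, y):
    """Product order: (alpha -> beta) <= (alpha' -> beta') iff alpha <= alpha' and beta <= beta'."""
    (alpha, beta), (alpha2, beta2) = x, y
    return precedes(alpha, alpha2) and precedes(beta, beta2)

A2 = list(product(A, A))   # every ordered pair alpha -> beta
bottom = [x for x in A2 if all(precedes2(x, y) for y in A2)]
top = [x for x in A2 if all(precedes2(y, x) for y in A2)]

print(len(A2))                                        # 16
print(bottom == [(node({1}, {2}), node({1}, {2}))])   # True: {1}{2} -> {1}{2}
print(top == [(node({1, 2}), node({1, 2}))])          # True: {12} -> {12}
```

Pairs such as {1} → {2} and {2} → {1} are incomparable under this ordering. This is why the inversion must walk the full partial order rather than a simple chain.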
Mediano et al. also impose a partial-ordering criterion: if $\alpha \to \beta \preceq \alpha' \to \beta'$, then $I_\cap^{\alpha \to \beta} \leq I_\cap^{\alpha' \to \beta'}$. This ensures that the redundancy function induces the same partial ordering on atoms as the construction of the product lattice, so that the scaffold and the function are consistent.
1.2.1 Interpreting ΦID atoms. The standard PID atoms are reasonably easy to interpret in terms of logical conjunctions and disjunctions of sources. In the case of the ΦID, the left-hand side of the integrated information atom remains the same (collections of sources that redundantly disclose information), but there is no longer a single, consistent target. Instead, the right-hand side is again a collection of sources with its own pattern of redundant information sharing. What, then, is being disclosed information about? I will discuss the answer in formal detail below. One proposed intuition comes from the framework of information dynamics, which breaks the different "modes" of information flow in complex systems down into discrete "types of computation" or "processing" [20]. Mediano et al. [3,8] proposed the following intuitive taxonomy of integrated information atoms on the two-element lattice:
Information Storage: Information present in a particular configuration at time t − τ that remains in the same configuration at time t. In the case of the two-element system, these are {1}{2} → {1}{2}, {1} → {1}, {2} → {2}, and {12} → {12}.
Causal Decoupling: The double-synergy term {12} → {12} has been given particular focus as a possible formal definition of "emergent dynamics" [9,19,21]. It refers to information that is persistently present in the whole, but in none of the parts.
Information Transfer: Information present in a single element that "moves" to another single element: {1} → {2} and {2} → {1}. This is not to be confused with the transfer entropy [22], which typically involves extended histories and itself conflates unique and synergistic modes of information sharing [23].
This taxonomy has only begun to be explored (for example, see [24–26] for intriguing results related to macro-scale brain dynamics). A rigorous formal understanding of the relevant mathematics may help deepen our understanding of these various, and in some cases philosophically significant, phenomena.

Fig 2. The integrated information lattice. The integrated information lattice for a system $X = \{X^1, X^2\}$. Every vertex of the lattice corresponds to a specific "conversion of information": a form that information in one mode at time t − τ can be transformed into at time t. For example, {1}{2} → {1} corresponds to information that is redundantly disclosed by $X^1$ and $X^2$ at time t − τ and is then only uniquely disclosed by $X^1$ at time t. https://doi.org/10.1371/journal.pone.0282950.g002

Shared exclusions & (temporal) redundancy
A peculiar quirk of the PID and its derivatives is that, while it reveals the "structure" of multivariate information, it does not provide a direct means of calculating the specific values. It assumes the existence of a well-behaved redundancy measure and builds from there. Since the initial introduction by Williams and Beer, the number of different redundancy functions has proliferated (see [10,13,27–36]). To date, however, no measure has achieved universal acceptance or satisfies every desideratum.
Because the ΦID is much newer, there has been less work on double-redundancy functions. To date, only three have been used, most of them only once: a temporal minimum mutual information analysis [24,26,37], a measure based on the dependency lattice [9], and a generalization of the common change in surprisal measure [3]. While all of these analyses are informative, there is still room for deeper insights into the exact nature of temporal redundancy and into how information conversion occurs between ensembles of variables. In this work, I generalize a recent redundancy function, the $I_{sx}$ measure first proposed by Makkeh et al. [10], to account for multiple targets; I term the result $I_{\tau sx}$. I selected $I_{sx}$ as my starting point for three reasons:
- It illuminates an elegant connection between multivariate information sharing and formal logic.
- It requires neither arbitrary thresholds (as in the case of $I_{ccs}$ [33]) nor non-differentiable min/max functions (as in $I_{mmi}$ [31] and the closely related $I^{\pm}$ [34]).
- It is localizable, returning values for every possible configuration rather than only an expected value over the entire distribution.

Below, I introduce the basics of local information theory, a key prerequisite for defining $I_{sx}$. I then define the redundancy function for single targets, and ultimately generalize it to multi-target information.

Local information theory
Thus far, I have been using the standard interpretation of mutual information as an average value over some distribution of configurations. For any specific configuration, the local mutual information is defined as:
$$i(x; y) = \log_2 \frac{P(x \mid y)}{P(x)}$$
Unlike the expected mutual information, the local mutual information can be either positive or negative, depending on whether P(x|y) or P(x) is the greater term. The local mutual information is well explored and has been used extensively to characterize "computation" in complex systems [20]. Only recently, however, has a novel interpretive framework emerged based on exclusions of probability mass. Finn and Lizier [38] showed that the sign and value of the local mutual information i(x; y) can be understood as a function of the amount of probability mass from P(X, Y) that is "ruled out" upon observing that X = x and Y = y. For a very simple example, consider a system where one player rolls a fair die and another has to guess the value. Initially, the guesser is maximally uncertain, as all six outcomes are equiprobable. If they then learn that the number rolled was even, they have gained information determined by the total probability mass of all excluded outcomes (here, half of the mass, or 1 bit). Formally, the local mutual information can be re-written in terms of probability mass exclusions as:
$$i(x; y) = \log_2 \frac{P(y \mid x)}{P(y)} = \log_2 \frac{P(y) - P(\bar{x} \cap y)}{P(y)\,\big(1 - P(\bar{x})\big)}$$
where $\bar{x}$ denotes the excluded event X ≠ x. In this relationship, if y is comparatively more likely after accounting for x, then i(x; y) > 0; if it is less likely, the value is negative.
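The die example can be made concrete by computing the local mutual information both ways: once as a ratio of probabilities, and once through the excluded mass $P(\bar{x})$. The sketch assumes a hypothetical joint distribution in which Y is a fair die roll and X reports only its parity. Exact fractions are used so that both forms agree exactly.

```python
import math
from fractions import Fraction

# Hypothetical joint distribution: Y is a fair die roll, X reports its parity.
joint = {("even" if y % 2 == 0 else "odd", y): Fraction(1, 6) for y in range(1, 7)}

def prob(event):
    """Total probability of the configurations (x, y) satisfying event."""
    return sum(p for cfg, p in joint.items() if event(cfg))

def local_mi_ratio(x, y):
    """i(x; y) = log2 P(x, y) / (P(x) P(y))."""
    p_xy = prob(lambda c: c == (x, y))
    return math.log2(float(p_xy / (prob(lambda c: c[0] == x) * prob(lambda c: c[1] == y))))

def local_mi_exclusion(x, y):
    """Same quantity via the excluded event xbar (X != x)."""
    p_y = prob(lambda c: c[1] == y)
    p_xbar = prob(lambda c: c[0] != x)
    p_xbar_y = prob(lambda c: c[0] != x and c[1] == y)
    return math.log2(float((p_y - p_xbar_y) / (p_y * (1 - p_xbar))))

print(local_mi_ratio("even", 2), local_mi_exclusion("even", 2))   # 1.0 1.0
```

Learning "even" excludes the three odd outcomes, half of the probability mass. This leaves each even outcome twice as likely as before, which is the 1 bit quoted above.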

Single-target redundancy based on shared exclusions (I sx )
The classic local mutual information is bivariate, quantifying the information shared between two variables. To construct a function that accounts for multiple sources redundantly disclosing information about a single target, Makkeh et al. leveraged a link between redundant information and logical implication [7,10]. Briefly, consider some set of logical statements $\psi_1, \ldots, \psi_k$. The information that is redundantly disclosed by all of them is the information learned if $\psi_1$ = True OR $\psi_2$ = True OR ... OR $\psi_k$ = True. From this, Makkeh et al. define a logical redundancy measure that induces the same lattice as the PID. The function maps every $\alpha \in \mathcal{A}$ to a logical statement. For example, the atom {1}{2}{3} maps to the information disclosed if $\psi_1 \lor \psi_2 \lor \psi_3$ = True, the atom {1}{23} maps to the statement $\psi_1 \lor (\psi_2 \land \psi_3)$ = True, and so on. The application to random variables is straightforward. Given some set of source variables informing on a target y, the information about y redundantly disclosed by all the sources is the information that could be learned by observing $x^1$ alone OR $x^2$ alone ... OR $x^k$ alone. The logic extends to more complicated atoms. For instance, {1}{23} is the information about y that would be learned by observing just $x^1$ alone OR the joint state of $x^2$ AND $x^3$ together.
As in the case of local mutual information, $i_{sx}$ defines "disclosing information" in terms of probability mass exclusions. For example, observing $(X^1 = x^1) \lor (X^2 = x^2 \land X^3 = x^3)$ changes the probability of observing Y = y. As with the local mutual information, the value of $i_{sx}$ can be positive or negative, depending on how P(y) changes.
More formally, consider a set of (potentially overlapping, potentially multivariate) sources $a_1, \ldots, a_k$ that collectively disclose information about a target y. The information redundantly shared between them can be defined as a function of the probability mass of P(Y) that would be excluded by observing $a_1 \lor \ldots \lor a_k$:
$$i_{sx}(a_1, \ldots, a_k; y) = \log_2 \frac{P(y \mid a_1 \lor \ldots \lor a_k)}{P(y)}$$
For the special case of only one source, it is clear that $i_{sx}(a; y) = i(a; y)$, which is itself just a regular joint mutual information: $i(x^{a_1}, \ldots, x^{a_{|a|}}; y)$. In this sense, $i_{sx}$ can be understood as generalizing the Shannon mutual information to ensembles of multiple sources that redundantly share information about y [7].
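Like the standard local mutual information, $i_{sx}$ can return both positive and negative values, corresponding to informative and misinformative probability mass exclusions respectively. These two types of exclusion can be quantified by further decomposing $i_{sx}$ into two components:
$$i_{sx}(a_1, \ldots, a_k; y) = i^{+}_{sx} - i^{-}_{sx} = \log_2 \frac{1}{P(a_1 \lor \ldots \lor a_k)} - \log_2 \frac{P(y)}{P\big(y \land (a_1 \lor \ldots \lor a_k)\big)}$$
In the context of a single-target PID, $i^{+}_{sx}$ and $i^{-}_{sx}$ are provably non-negative and satisfy the original desiderata proposed by Williams and Beer. The local redundant information measures can be aggregated into expected measures over the distribution of configurations, in the same way as mutual information:
$$I_{sx}(\alpha; Y) = \mathbb{E}\big[i_{sx}(\alpha; y)\big] = \sum_{\mathbf{x}, y} P(\mathbf{x}, y)\, i_{sx}(\alpha; y)$$
The informative and misinformative functions are aggregated in the same way.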
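The shared-exclusion redundancy can be prototyped directly from these definitions. In the sketch below, each source is a tuple of positions read as a conjunction, and a collection of sources is read as a disjunction. A vertex such as {1}{23} would therefore be written `[(0,), (1, 2)]`. The copy and XOR distributions are hypothetical toy examples. The code computes $i_{sx}$, $i^{+}_{sx}$ and $i^{-}_{sx}$ at one configuration, and the expected value over the support. This is a plug-in sketch of the formulas above, not a reference implementation.

```python
import math

def disjunction(alpha, cfg):
    """Event 'a_1 OR ... OR a_k': some source in alpha (a tuple of positions, read as a
    conjunction) takes the same values as in the observed configuration cfg."""
    return lambda c: any(all(c[i] == cfg[i] for i in src) for src in alpha)

def prob(joint, event):
    return sum(p for c, p in joint.items() if event(c))

def local_isx(joint, alpha, cfg, target):
    """Return (i_sx, i_plus, i_minus) for sources alpha and target positions at cfg."""
    in_alpha = disjunction(alpha, cfg)
    is_y = disjunction([target], cfg)           # the target is a single conjunction
    p_alpha = prob(joint, in_alpha)
    p_y = prob(joint, is_y)
    p_both = prob(joint, lambda c: in_alpha(c) and is_y(c))
    i_plus = math.log2(1 / p_alpha)             # informative exclusions
    i_minus = math.log2(p_y / p_both)           # misinformative exclusions
    return i_plus - i_minus, i_plus, i_minus

def expected_isx(joint, alpha, target):
    """Average the local redundancy over the support of the distribution."""
    return sum(p * local_isx(joint, alpha, cfg, target)[0] for cfg, p in joint.items() if p > 0)

# Hypothetical toy distributions over (x1, x2, y).
copy = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
red = [(0,), (1,)]                               # the vertex {1}{2}

print(local_isx(xor, red, (0, 0, 0), (2,)))      # about (-0.585, 0.415, 1.0)
print(expected_isx(copy, red, (2,)))             # 1.0
print(expected_isx(xor, [(0, 1)], (2,)))         # 1.0, i.e. I(X1, X2; Y)
```

For XOR, the redundancy {1}{2} comes out negative, because observing $x^1$ OR $x^2$ makes the actual target value less likely than it was a priori. Both components nonetheless stay non-negative. The last line illustrates the single-source reduction to ordinary mutual information.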

Multi-target temporal redundancy based on shared exclusions (I τsx )
We now have all the required machinery to introduce our local measure of temporal information decomposition: $I_{\tau sx}$. In the original $i_{sx}$ measure, the mutual information is understood as the relative increase or decrease in the probability P(Y = y) after observing the configuration of some ensemble of sources. In $I_{\tau sx}$, the probability of the single target is replaced with the probability of observing $b_1 \lor \ldots \lor b_m$. It is worth considering the intuition behind this change. Suppose $x = \{x^1, x^2\}$ and $y = \{y^1, y^2\}$. I am interested in which probability mass exclusions induced by $x^1$ OR $x^2$ are consistent with either $y^1$ OR $y^2$. Said differently: what information that could be learned from either $x^1$ OR $x^2$ (i.e. is redundantly present in both of them) is true about any configuration consistent with $y^1$ OR $y^2$? Formally:
$$i_{\tau sx}(a_1, \ldots, a_k \to b_1, \ldots, b_m) = \log_2 \frac{P(b_1 \lor \ldots \lor b_m \mid a_1 \lor \ldots \lor a_k)}{P(b_1 \lor \ldots \lor b_m)}$$
From here forward, for notational compactness, I will denote ensembles of sources with α, β, etc., writing this quantity as $i_{\tau sx}^{\alpha \to \beta}$.
From the definition of $i_{\tau sx}$, it is clear that it satisfies the compatibility criterion proposed by Mediano et al. [3,8]. When decomposing i(x; y) with |y| = 1, Eq 20 is equivalent to Eq 15, as the union of all target sources ($b_1 \cup \ldots \cup b_m$) is equivalent to the single source y; the same holds when |x| = 1. This shows that $i_{\tau sx}$ is consistent with the classic PID (inducing the standard single-target redundancy lattice). In the special case of single sources (α = {a}, β = {b}), it is clear that $i_{\tau sx}(a; b) = i(a; b)$. $i_{\tau sx}$ therefore completes the generalization of local mutual information begun by $i_{sx}$: it is a full generalization of the mutual information to multiple sets of redundant sources and multiple sets of redundant targets.
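To see the multi-target definition at work, the sketch below evaluates $i_{\tau sx}^{\alpha \to \beta}$ on a hypothetical deterministic "swap" system with a uniform past, in which each element takes on the other's past state. Configurations are ordered (x1_past, x2_past, x1_future, x2_future), and sources are tuples of positions as before. The values are local double redundancies, not ΦI atoms; the atoms would additionally require a Möbius inversion over the product lattice.

```python
import math

def disjunction(sources, cfg):
    """Event 's_1 OR ... OR s_k'; each source is a tuple of positions read as a conjunction."""
    return lambda c: any(all(c[i] == cfg[i] for i in src) for src in sources)

def prob(joint, event):
    return sum(p for c, p in joint.items() if event(c))

def local_itsx(joint, alpha, beta, cfg):
    """log2 P(b_1 OR ... OR b_m | a_1 OR ... OR a_k) / P(b_1 OR ... OR b_m) at cfg."""
    in_a, in_b = disjunction(alpha, cfg), disjunction(beta, cfg)
    p_a, p_b = prob(joint, in_a), prob(joint, in_b)
    p_ab = prob(joint, lambda c: in_a(c) and in_b(c))
    return math.log2(p_ab / (p_a * p_b))

# Hypothetical swap with a uniform past: X^1_t = X^2_{-tau} and X^2_t = X^1_{-tau}.
swap = {(a, b, b, a): 0.25 for a in (0, 1) for b in (0, 1)}
cfg = (0, 1, 1, 0)

print(local_itsx(swap, [(0,)], [(3,)], cfg))               # {1} -> {2}: 1.0
print(local_itsx(swap, [(0,)], [(2,)], cfg))               # {1} -> {1}: 0.0
print(local_itsx(swap, [(0, 1)], [(2, 3)], cfg))           # {12} -> {12}: 2.0, the local MI
print(local_itsx(swap, [(0,), (1,)], [(2,), (3,)], cfg))   # {1}{2} -> {1}{2}: about 0.415
```

With single sources on both sides, the function returns the ordinary local mutual information, as the third line shows. This is the reduction described above.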

2.3.1 Multi-target redundancy & entropy decomposition.
The full $i_{\tau sx}^{\alpha \to \beta}$ function does not satisfy the partial ordering (monotonicity) criterion. Like $i_{sx}$, it can be negative or positive, depending on the structure of the dependency between the elements. The double redundancy can, however, be decomposed into three redundant entropy terms that are partially ordered and consistent with the integrated information lattice. These three redundant entropy terms induce three partial entropy decompositions [39–41]: two marginal decompositions on the classic redundancy lattice, and a joint decomposition on the product lattice.
The double-redundancy function $i_{\tau sx}^{\alpha \to \beta}$ can be re-written in terms of sums and differences of union entropies. I can re-write Eq 20 in an equivalent form:
$$i_{\tau sx}^{\alpha \to \beta} = \log_2 \frac{P\big((a_1 \lor \ldots \lor a_k) \land (b_1 \lor \ldots \lor b_m)\big)}{P(a_1 \lor \ldots \lor a_k)\, P(b_1 \lor \ldots \lor b_m)}$$
For proof of equivalence, see Appendix A in S1 Appendix. For some multivariate x, recall that i(x; x) = h(x). The joint entropy h(x) can be decomposed by assessing how different combinations of parts (i.e. all $x^i \in x$) redundantly and synergistically disclose information about the whole [10,41]. When the single-target $i_{sx}$ function is used to decompose $i(x^1, \ldots, x^k; x)$, the result equals $i^{+}_{sx}(x^1, \ldots, x^k; x)$, because there is no misinformative component (for proof, see Appendix B in S1 Appendix).
Intuitively, the redundant entropy function quantifies how much uncertainty about x is resolved by learning $x^1 \lor \ldots \lor x^k$. This function was recently explored in detail by Varley et al. [41] and denoted $h_{sx}$ after [10]:
$$h_{sx}(\alpha) = \log_2 \frac{1}{P(a_1 \lor \ldots \lor a_k)}$$
This function has previously been shown to satisfy the relevant Williams and Beer axioms locally [10]. I can then re-write Eq 20 as:
$$i_{\tau sx}^{\alpha \to \beta} = h_{sx}(\alpha) + h_{sx}(\beta) - h_{sx}(\alpha \cap \beta)$$
This framing provides a different, but complementary, perspective on $i_{\tau sx}$. The first term, $h_{sx}(\alpha)$, quantifies how much uncertainty about the global joint state (x, y) is resolved by learning the past states $a_1$ OR ... OR $a_k$. Similarly, $h_{sx}(\beta)$ quantifies the uncertainty about (x, y) resolved by learning the future states $b_1$ OR ... OR $b_m$.

The final term, $h_{sx}(\alpha \cap \beta)$, is a little less straightforward. It reflects the structure of the double-redundancy lattice and satisfies the required partial ordering imposed by Mediano et al. [8]. Makkeh et al. [10] showed that, if $\alpha \preceq \alpha'$ on the marginal (classic) redundancy lattice, then the set of configurations consistent with α′ is a subset of the configurations consistent with α. This ensures that $h_{sx}(\alpha) \leq h_{sx}(\alpha')$, and likewise for β. The double redundant entropy term $h_{sx}(\alpha \cap \beta)$ is a function of the probability of the intersection of the configurations consistent with α AND β. If $\alpha' \preceq \alpha$ and $\beta' \preceq \beta$, then the configurations consistent with $\alpha' \cap \beta'$ form a superset of those consistent with $\alpha \cap \beta$. Consequently, $h_{sx}(\alpha \cap \beta) \geq h_{sx}(\alpha' \cap \beta')$. For a worked example, see Appendix C in S1 Appendix. Note that, while $h_{sx}(\alpha \cap \beta) \geq 0$, the associated partial entropy atoms are not necessarily non-negative after the Möbius inversion.
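The entropy form of the double redundancy and its monotonicity can be checked numerically. The sketch below uses a hypothetical noisy-swap system over (x1_past, x2_past, x1_future, x2_future), where each element copies the other's past state with flip probability 0.1. It computes $h_{sx}$ for single lattice vertices and for intersections. It then confirms that the lower vertex {1}{2} yields no more redundant entropy than {1}, both marginally and jointly.

```python
import math

def disjunction(sources, cfg):
    """Event 's_1 OR ... OR s_k'; each source is a tuple of positions read as a conjunction."""
    return lambda c: any(all(c[i] == cfg[i] for i in src) for src in sources)

def prob(joint, event):
    return sum(p for c, p in joint.items() if event(c))

def h_sx(joint, sources, cfg):
    """Redundant entropy log2 1 / P(s_1 OR ... OR s_k)."""
    return -math.log2(prob(joint, disjunction(sources, cfg)))

def h_sx_joint(joint, alpha, beta, cfg):
    """log2 1 / P((a_1 OR ... OR a_k) AND (b_1 OR ... OR b_m))."""
    in_a, in_b = disjunction(alpha, cfg), disjunction(beta, cfg)
    return -math.log2(prob(joint, lambda c: in_a(c) and in_b(c)))

def i_tsx(joint, alpha, beta, cfg):
    """Double redundancy written as h_sx(alpha) + h_sx(beta) - h_sx(alpha AND beta)."""
    return h_sx(joint, alpha, cfg) + h_sx(joint, beta, cfg) - h_sx_joint(joint, alpha, beta, cfg)

# Hypothetical noisy swap: X^1_t copies X^2_{-tau}, X^2_t copies X^1_{-tau}, flip prob 0.1.
eps = 0.1
joint = {
    (a, b, c, d): 0.25 * (1 - eps if c == b else eps) * (1 - eps if d == a else eps)
    for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)
}
cfg = (0, 1, 1, 0)
red_past, x1_past = [(0,), (1,)], [(0,)]   # {1}{2} precedes {1} on the lattice
red_fut, x1_fut = [(2,), (3,)], [(2,)]

# Lower vertices admit more configurations, hence a smaller redundant entropy.
print(h_sx(joint, red_past, cfg) <= h_sx(joint, x1_past, cfg))                               # True
print(h_sx_joint(joint, red_past, red_fut, cfg) <= h_sx_joint(joint, x1_past, x1_fut, cfg))  # True
print(i_tsx(joint, [(0, 1)], [(2, 3)], cfg))   # about 1.696 = log2(0.2025 / 0.0625)
```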
From Eq 23, I can also construct informative and misinformative probability mass exclusion components of $i_{\tau sx}$, analogous to the informative and misinformative components of $i_{sx}$. However, unlike in the single-target case, $i^{-}_{\tau sx}$ does not follow the partial ordering criterion that $h_{sx}$ does, and it is also not strictly non-negative.
2.3.2 Interpreting $I_{\tau sx}$ in the ΦID. Our analysis of $i_{\tau sx}$ and its decomposition into $h_{sx}(\alpha) + h_{sx}(\beta) - h_{sx}(\alpha \cap \beta)$ has, thus far, been general, and could apply to any multivariate mutual information $i(x; y)$; no temporal dynamic is assumed. When decomposing $i(x_{-\tau}; x_t)$, the partial entropy terms can be understood as parcelling out the information contained in the instantaneous structure at times $t - \tau$ and $t$, and the information in the dynamics: the transition from past to future. Each of the marginal partial entropy decompositions captures the entire instantaneous structure at that moment. For example, when decomposing $h(x_{-\tau})$ it is possible to extract all of the dependencies between the elements of $x_{-\tau}$ (i.e. the mutual informations $i(x^i_{-\tau}; x^j_{-\tau})$, the conditional mutual informations $i(x^i_{-\tau}; x^j_{-\tau} \mid x^k_{-\tau})$, and so on [39,41]), and likewise for $x_t$. The sum $h_{sx}(\alpha_{-\tau}) + h_{sx}(\beta_t)$, then, quantifies the total amount of information that could be learned about the transition $x_{-\tau} \to x_t$ without making any reference to the temporally extended dynamics of the system (note that if time were reversed, the sum of these entropy terms would be unchanged). It is the total "static" structure.
The dynamic structure is encoded in the last term, $h_{sx}(\alpha_{-\tau} \cap \beta_t)$, which is the only term that incorporates information from the state-transition structure. This helps interpret cases where $i_{\tau sx}(\alpha_{-\tau} \to \beta_t) < 0$ bit. Negativity occurs when $h_{sx}(\alpha_{-\tau}) + h_{sx}(\beta_t) < h_{sx}(\alpha_{-\tau} \cap \beta_t)$. In plain language, this occurs when there is more information in the structure of the transition $x_{-\tau} \to x_t$ than in the instantaneous structures at times $t - \tau$ and $t$.
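The negativity condition can be illustrated with a deliberately simple, hypothetical process: a single binary element whose past state is uniform and which flips with probability 0.9 at each step. With one source and one target, the union events reduce to the events themselves. For the rare "stay" transition $0 \to 0$, the static terms sum to 2 bit while the dynamic term is $-\log_2 0.05 \approx 4.32$ bit, so the local value is negative; the expected value remains the ordinary, non-negative mutual information. The flip probability and names are illustrative assumptions.

```python
from math import log2

FLIP = 0.9  # hypothetical probability that the element changes state
JOINT = {(x, y): 0.5 * (FLIP if x != y else 1 - FLIP) for x in (0, 1) for y in (0, 1)}

def local_terms(past, future):
    """Return (static, dynamic, local i_tau_sx) for a single-source, single-target atom."""
    p_a = sum(p for (x, _), p in JOINT.items() if x == past)
    p_b = sum(p for (_, y), p in JOINT.items() if y == future)
    static = -log2(p_a) - log2(p_b)          # h_sx(alpha) + h_sx(beta)
    dynamic = -log2(JOINT[(past, future)])   # h_sx(alpha ∩ beta)
    return static, dynamic, static - dynamic

expected = sum(p * local_terms(x, y)[2] for (x, y), p in JOINT.items())
print(local_terms(0, 0))   # static ≈ 2, dynamic ≈ 4.32, local value < 0
print(round(expected, 3))  # 0.531, i.e. 1 - H_b(0.1)
```

Swapping `past` and `future` leaves the static sum unchanged, which is the time-reversal symmetry noted above.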

Results
In this paper, I have proposed a novel function of multi-target redundancy to be used as the foundation of an integrated information decomposition [3,8]. Based on the logic of information as exclusions of possible configurations [38], the proposed measure, $I_{\tau sx}$, generalizes the single-target redundancy measure first proposed by Makkeh et al. [10] to enable the full decomposition of the excess entropy intrinsic to discrete dynamical processes. To demonstrate the measure in action in the context of the ΦID, I will now explore some applications. The first three are constructed systems designed to display markedly different dynamics (disintegrated, integrated, and heterogeneous), illustrating how different "types" of integration can be revealed by the decomposition. I will then examine spiking data from dissociated cultures of rat brain tissue to demonstrate the insights that can be gained from both the expected and the local integrated information decompositions.

Synthetic systems
Each of the three synthetic systems is composed of two binary elements that evolve through time according to different Markovian state-transition networks (visualized in Fig 3). Prior work on such simple Boolean networks has shown that even very small systems display a surprisingly rich distribution of redundant, unique, and synergistic effective information atoms [18]. Despite their simplicity, the synthetic systems studied here showcase how $I_{\tau sx}$ can reveal markedly different dynamic regimes. They were designed to show the two extreme behaviours of $\Phi^{WMS}$: the first system is totally disintegrated and $\Phi^{WMS} = 0$ bit (as the future of the whole can be perfectly predicted from the independent parts). The second system is completely integrated: the sum of the excess entropies of the parts is 0 bit, while the whole has non-zero excess entropy. The third system is a heterogeneous combination of integrated and disintegrated dynamics. Considering the limiting cases of $\Phi^{WMS}$ can help build intuition about how the ΦID framework describes diverse dynamics. I hypothesized that the disintegrated system should, generally, not have much redundant temporal mutual information, as the elements are independent of each other, and that there should be little information transfer between individual elements. Similarly, for the integrated system, I expected low redundancy and a high degree of synergy (as the future of the whole can only be partially predicted, and only by knowing the past of the whole).
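The two limiting cases of $\Phi^{WMS}$ can be reproduced directly from the state-transition rules described for $S^D$ and $S^G$. This sketch assumes lag-1 excess entropies and a uniform distribution over past states (which is stationary for both rules); the heterogeneous system is omitted because its construction is detailed in [18], and all names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2

STATES = list(product((0, 1), repeat=2))

def disintegrated(x):
    """S_D: each element oscillates deterministically, 0 -> 1 -> 0 -> ..."""
    return {(1 - x[0], 1 - x[1]): 1.0}

def integrated(x):
    """S_G: parity is preserved; the next state is uniform among same-parity states."""
    same = [y for y in STATES if sum(y) % 2 == sum(x) % 2]
    return {y: 1 / len(same) for y in same}

def joint_from_tpm(tpm):
    """Joint P(past, future), assuming a uniform distribution over past states."""
    return {(x, y): p / len(STATES) for x in STATES for y, p in tpm(x).items()}

def mutual_info(joint, f, g):
    """I(f(past); g(future)) in bits."""
    pxy, px, py = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        pxy[f(x), g(y)] += p
        px[f(x)] += p
        py[g(y)] += p
    return sum(p * log2(p / (px[a] * py[b])) for (a, b), p in pxy.items() if p > 0)

def phi_wms(joint):
    """Return (E(whole), E(X^1) + E(X^2), Phi^WMS) from lag-1 excess entropies."""
    whole = mutual_info(joint, lambda s: s, lambda s: s)
    parts = sum(mutual_info(joint, lambda s, i=i: s[i], lambda s, i=i: s[i])
                for i in range(2))
    return whole, parts, whole - parts

for name, tpm in (("S_D", disintegrated), ("S_G", integrated)):
    print(name, phi_wms(joint_from_tpm(tpm)))
```

Under these assumptions the script should report $\Phi^{WMS} = 0$ bit for $S^D$ (2 bit for the whole, 1 bit per element) and 1 bit for $S^G$ (1 bit for the whole, 0 bit per element), matching the values stated in the text.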
3.1.1 Disintegrated system. The first system, $S^D$, is "disintegrated", in that each of the two dynamic elements is disconnected from the other: both predict their own futures with total determinism (the pattern is an oscillation $1 \to 0 \to 1 \to \ldots$), but there is no integration. Consequently, the excess entropy $I(S^D_{t-1}; S^D_t) = 2$ bit, and the two individual excess entropies are each 1 bit: the "whole" is trivially reducible to the sum of its parts, since there is no interaction between the elements.

Fig 3. On the left is a disintegrated system, where $E(X) = E(X^1) + E(X^2) = 2$ bit (i.e. the whole is equal to the sum of the parts). In the middle is a highly integrated system, where $E(X) = 1$ bit and both $E(X^i) = 0$ bit (i.e. the whole is greater than the sum of its parts). On the right is a random system, combining a heterogeneous mixture of integrated and segregated dynamics.

Decomposing the excess entropy using $I_{\tau sx}$ reveals several interesting relationships (for the full decomposition, see Table 1). As expected, the strongest information atoms are the element-wise information "storage" atoms: {1} → {1} and {2} → {2}. This is consistent with the intuition that the future of each element is best predicted by its own immediate past. That the unique information storage atoms are also the largest single atoms is consistent with the idea that the most informative information dynamic about the whole system is the behaviour of the individual nodes considered individually. The informative interactions between the unique and the synergistic information are also consistent with our intuitions about the disintegrated system. For instance, the informative value of the "upward-causation" atom {1} → {12} reflects the fact that knowing the state of $X^1_{-\tau}$ also informs on the state of the whole $X_t$ in the future: knowing $X^1_{-\tau} = 0$ rules out any configuration of $X_t$ where $X^1_t = 0$ (since the values of each $X^i$ oscillate). Likewise for the "downward-causation" atoms such as {12} → {1}: knowing the state of the whole at time $t - \tau$ constrains the individual parts at time $t$ (albeit to a lesser extent than they constrain themselves).
The non-zero value of the double redundancy atom {1}{2} → {1}{2} is unexpected, although not inexplicable. Suppose that, at time $t - \tau$, $X^1_{-\tau} = 0$ OR $X^2_{-\tau} = 0$. Since both $X^i$ oscillate ($0 \to 1 \to 0 \to 1 \ldots$), learning the state of either past variable is enough to rule out one possible joint future: $X_t = (0, 0)$. Consequently, the union probability of all possible futures consistent with $X^1_t \neq 0$ OR $X^2_t \neq 0$ increases. The final set of atoms worth exploring are the negative values of the information transfer terms (such as {1} → {2}). The temporal mutual information $I(X^1_{-\tau}; X^2_t) = 0$ bit, but the partial information atom is less than zero. Why? Computing the information transfer atom requires subtracting from the temporal mutual information the sum of all the atoms that precede it on the lattice. In this case, the only non-zero atom among these is the double redundancy atom {1}{2} → {1}{2}, which, when subtracted, produces a negative value (since the temporal mutual information is 0 bit).

PLOS ONE

3.1.2 Integrated system. The second system, $S^G$, is "integrated", in that the whole system has 1 bit of excess entropy, but both elements have individual excess entropies of 0 bit. This is accomplished using a parity-check function: at every time step, the parity of the system is preserved, but the individual assignments are made randomly. For example, if $S^G_{-\tau} = (1, 0)$, then $S^G_t$ could equal $(0, 1)$ or $(1, 0)$ with equal probability (but never $(0, 0)$ or $(1, 1)$). For a visualization of the state-transition matrix, see Fig 3, center.
The first interesting finding is that, as with $S^D$, the double-redundancy atom is unexpectedly positive. This occurs because, as before, learning $X^1_{-\tau}$ OR $X^2_{-\tau}$ is sufficient to inform on the future. Suppose that $X^1_{-\tau} = 0$ OR $X^2_{-\tau} = 0$. There are two configurations with odd parity consistent with those conditions ((0, 1) and (1, 0)), and only one with even parity ((0, 0)). This means that the future state is also more likely to have odd parity than even parity.
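Both "unexpected" double-redundancy values can be reproduced in a few lines. The sketch below works under stated assumptions: it takes the uniform distribution over past states as the stationary distribution of both systems, indexes elements from 0, and computes the local double redundancy as $\log_2 \frac{P(A \cap B)}{P(A)P(B)}$, where $A$ is the past union event for {1}{2} and $B$ the corresponding future union event. This is our reading of $h_{sx}(\alpha) + h_{sx}(\beta) - h_{sx}(\alpha \cap \beta)$; the function names are hypothetical, and the numbers need not coincide with Table 1 if the paper's estimation differed.

```python
from itertools import product
from math import log2

STATES = list(product((0, 1), repeat=2))
DOUBLE = [(0,), (1,)]  # the antichain {1}{2}; elements are indexed from 0 here

def joint_disintegrated():
    """S_D: both elements flip every step; uniform distribution over past states assumed."""
    return {(x, (1 - x[0], 1 - x[1])): 0.25 for x in STATES}

def joint_integrated():
    """S_G: parity is preserved and the next state is uniform among same-parity states."""
    return {(x, y): 0.125 for x in STATES for y in STATES if sum(x) % 2 == sum(y) % 2}

def agrees(state, observed, antichain):
    """OR over sources: does `state` match `observed` on every index of some source?"""
    return any(all(state[i] == observed[i] for i in src) for src in antichain)

def local_i_tau_sx(joint, past, future, alpha, beta):
    """log2 P(A and B) / (P(A) P(B)) for the past union event A and future union event B."""
    p_a = sum(p for (x, _), p in joint.items() if agrees(x, past, alpha))
    p_b = sum(p for (_, y), p in joint.items() if agrees(y, future, beta))
    p_ab = sum(p for (x, y), p in joint.items()
               if agrees(x, past, alpha) and agrees(y, future, beta))
    return log2(p_ab / (p_a * p_b))

def expected_i_tau_sx(joint, alpha, beta):
    return sum(p * local_i_tau_sx(joint, x, y, alpha, beta) for (x, y), p in joint.items())

def p_odd_future_given(joint, past, alpha):
    """P(future has odd parity | past union event for `alpha` at `past`)."""
    selected = {k: p for k, p in joint.items() if agrees(k[0], past, alpha)}
    odd = sum(p for (_, y), p in selected.items() if sum(y) % 2 == 1)
    return odd / sum(selected.values())

for name, joint in (("S_D", joint_disintegrated()), ("S_G", joint_integrated())):
    print(name, round(expected_i_tau_sx(joint, DOUBLE, DOUBLE), 3))  # S_D 0.415, S_G 0.152
print(round(p_odd_future_given(joint_integrated(), (0, 0), DOUBLE), 3))  # 0.667
```

Under these assumptions every individual transition of $S^D$ has the same local value, $\log_2(4/3)$, and every transition of $S^G$ has $\log_2(10/9)$. The conditional probability of an odd-parity future given $X^1_{-\tau} = 0$ OR $X^2_{-\tau} = 0$ is 2/3, in line with the counting argument above.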
Due 152. The result is admittedly difficult to interpret, although negative atoms in classic, single-target PIDs are also relatively widespread (see [10,33,34]). One possible interpretation, explored further in Section 3.1.4, is that these values reflect how the context provided by the dynamics of the whole system can influence our interpretation of the dynamics of the individual parts. For example, when considered alone, there is no predictive information about the future of $X^1$ in its own past; however, $X^1$'s evolution through time is not an autonomous process, but occurs in the context of $X^2$'s dynamics. Consequently, the inferences one can make about $X^1$ when considering the other elements of the system may differ from those that could be made if $X^1$ were considered alone.
The higher-order, synergistic atoms show the same effect: since the redundancy-to-synergy and synergy-to-redundancy atoms ({1}{2} → {12} and {12} → {1}{2}) are both negative, the upward- and downward-causation atoms are positive, despite the fact that the relevant mutual informations are all 0 bit.
3.1.3 Heterogeneous system. The final system has heterogeneous transitions, with probabilities drawn from a Gaussian distribution $\mathcal{N}(0, 1)$ (for details, see Varley & Hoel [18]). In contrast to the prior two systems, $S^H$ does not have an a priori fixed "type" of dynamic, and was expected to display multiple types of information conversion. From the outset, I anticipated evidence of synergistic dynamics, as the excess entropy of the whole system was 0.422 bit, while the two elements had individual temporal mutual informations of only 0.017 bit and 0.001 bit respectively, indicating a dynamic where the whole is much more predictive than the sum of its parts. For a visualization of the state-transition matrix, see Fig 3, right. Similarly, the conversions from redundant to synergistic information and vice versa have opposite signs, suggesting that this system simultaneously displays informative "downward causation" but misinformative "upward causation". In total, there were more informative integrated information atoms than misinformative ones (a ratio of 11 to 5), showing that, despite the strongly synergistic nature of the system overall, unique information transfer and redundant information dynamics co-existed. This is consistent with previous work finding that these kinds of modified-Gaussian systems can display a wide range of information dynamics at multiple scales [18].
3.1.4 Interpreting negative local ΦI atoms. The three systems described above provide concrete toy models that can be used to build intuition about the phenomenon of negative ΦI values. Consider the heterogeneous system $S^H$, specifically the information that $S^2_{t-1}$ communicates to itself at time $t$, $I(S^2_{t-1}; S^2_t)$. This value, sometimes called active information storage [20], is an expected mutual information and must be non-negative: in this case, it is 0.001 bit. If an observer were only observing the dynamics of $S^2$ (ignoring $S^1$ entirely), their uncertainty about the future of $S^2$ would be reduced by observing its past. Despite this, the "stored" partial information {2} → {2} is negative. How can this occur?
Our interpretation is that this mismatch arises because the evolution of $S^2$ occurs in the context of all other elements in the system. Analysing $S^2$ on its own can be misleading, because its dynamics are informed by the states of the other elements. In this case, the lion's share of the information is initially present redundantly in both elements and then erased from $S^1$: {1}{2} → {2}. So a significant amount of the information observed in $I(S^2_{t-1}; S^2_t)$ is not specific to $S^2$ (at least not at first), but rather emerges from its interaction with $S^1$.
In the case of these toy systems, there is no "mechanism" to be explored; however, this kind of distinction may be of value when analysing real-world systems. For example, a scientist studying the activity of neurons may observe a non-zero active information storage and propose to connect it to biophysical processes such as refractory periods [42]. The ΦID analysis, however, shows that this information may not actually reflect the specific dynamics of a single neuron, but rather some contextual interaction between two neurons, and that the actual memory capacity of the neuron is altogether different (in this case, misinformative rather than informative). This example is, at present, admittedly speculative; considerable work remains to understand how the outputs of the ΦID algorithm map onto complex, real-world dynamical processes.

Analysis of dissociated neural culture data
To demonstrate how decomposition of the excess entropy using $I_{\tau sx}$ might be applied to empirical data, I analysed 31 dissociated cultures of rat cortex. These preparations were made by resecting slices of embryonic rat cortex and then culturing them to produce networks of living neuronal tissue [11]. After preparation of the cultures and a period of maturation, spontaneous spiking activity was recorded on a 60-electrode array and spike-sorted to produce a time series of spikes for each putative neuron (for details, see the original manuscript presenting these data [12] and the Materials & methods Section).
Dissociated and organotypic cultures have been a highly productive model system for research into information dynamics and "computation" in biological systems: for example, see studies of the relationship between criticality and information-theoretic complexity [12,43], network structure and synergy [44][45][46], changes to computational structure during maturation and development [47][48][49], and the topology of effective networks [50][51][52]. In many respects, they are a natural fit for these kinds of information-theoretic analyses: neuronal activity is naturally discrete (action potentials can be represented as binary states), the neuron is a well-defined "unit" (a single cell), the communication channels between units are well understood at the mechanistic level (neurons communicate over synapses via the release of neurotransmitters), and so are the general causal effects of interaction (neurons can be inhibitory or excitatory, a relationship easily expressible in terms of Bayesian prior and posterior probabilities [53]).
In this study, I demonstrate the utility of $I_{\tau sx}$ as both an expected and a localizable measure of information sharing by examining the pairwise relationships between neurons. Our particular focus is on avalanches of high-firing activity, which are typical of neural systems, and of systems poised near a critical phase transition in general. While criticality in the brain is a complex question (for review, see [54,55], and for a dissenting view, see [56]), it is an empirical fact that spontaneous activity in cortical networks displays avalanche dynamics of widely varying lengths (typically modeled as following a power-law or other heavy-tailed distribution [57]). While the existence of such avalanches is extremely well documented, and their genesis the subject of intensive modeling work, it is still unclear what role, if any, they play in cortical computation. Varley et al. hypothesized that they may play an integrative role after finding that loss of consciousness induced by the anaesthetic propofol caused a pronounced collapse of large-scale avalanche structure [58]; however, such hypotheses remain highly speculative in the absence of a formal framework for understanding localizable computation. I propose that the ΦID framework, coupled with the intrinsically local nature of $I_{\tau sx}$, solves that problem.

Distributions of average ΦID atoms.
For each of the 31 cultures, I calculated the lag-1 excess entropy for every pair of nodes in the network (restricting the analysis to consecutive bins within avalanches, as in [49]). If the expected excess entropy was significant at $\alpha = 10^{-6}$ (Bonferroni corrected), I went on to do the full integrated information decomposition. The result is that, for every culture, across all pairs of nodes with significant excess entropy, I can compute sixteen distinct pairwise "integrated information matrices" (for visualization see Fig 4). I normalized each of these expected values by dividing it by its associated excess entropy, to control for variability in the overall amount of temporal information.
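The pairwise screening step can be sketched as follows. This is a hypothetical illustration rather than the pipeline used for these data: it assumes a binary raster (neurons × time bins), defines an avalanche as a maximal run of bins in which at least one neuron spikes, keeps only transitions whose two bins fall inside the same avalanche, and estimates the lag-1 excess entropy of a pair with the plug-in mutual information between the pair's joint states. The significance test is left out because its null model is described in the Materials & methods section rather than here; the final helper divides each expected atom by the pair's excess entropy, as in the text.

```python
from collections import Counter
from math import log2

def within_avalanche_transitions(raster, i, j):
    """Yield ((x_i, x_j) at t-1, (x_i, x_j) at t) for consecutive bins that lie
    in the same avalanche. Hypothetical convention: a bin is active if any neuron
    spikes in it, and an avalanche is a maximal run of active bins."""
    n_bins = len(raster[0])
    active = [any(row[t] for row in raster) for t in range(n_bins)]
    for t in range(1, n_bins):
        if active[t - 1] and active[t]:
            yield (raster[i][t - 1], raster[j][t - 1]), (raster[i][t], raster[j][t])

def plugin_mutual_info(transitions):
    """Plug-in estimate (bits) of I(past joint state; future joint state)."""
    transitions = list(transitions)
    n = len(transitions)
    joint = Counter(transitions)
    past = Counter(p for p, _ in transitions)
    future = Counter(f for _, f in transitions)
    return sum(c / n * log2(c * n / (past[p] * future[f])) for (p, f), c in joint.items())

def normalize_atoms(atoms, excess_entropy):
    """Divide each expected atom by the pair's excess entropy."""
    return {name: value / excess_entropy for name, value in atoms.items()}

raster = [[1, 0, 0, 1, 1],   # neuron 0
          [0, 1, 0, 0, 1]]   # neuron 1
print(list(within_avalanche_transitions(raster, 0, 1)))  # [((1, 0), (0, 1)), ((1, 0), (1, 1))]
```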
To explore the overall distribution of normalized information atoms, I aggregated over all cultures to create histograms of the various ΦID components (Fig 5). I found that the element-level information storage atoms ({x} → {x}) had the highest average normalized value overall (0.417 ± 0.422), followed by the element-level information transfer atoms ({x} → {y}, 0.097 ± 0.195). These results are consistent with our initial expectations: individual neurons are known to have a strong individual temporal dependence [42,59] (likely reflecting the refractory period following an action potential). Similarly, the high element-wise information transfer is consistent with the basic mode of communication between neurons being pairwise synaptic signaling. The other modes of information conversion, however, remain more mysterious: for example, the information copy and information erasure atoms ({x} → {1}{2} and {1}{2} → {x}, respectively) both had values of 0.011 ± 0.0325, lower than the transfer atoms, but by less than an order of magnitude. Exactly what kind of biological process these modes correspond to is a promising area of future study. While every atom was negative for some particular pairs of neurons, at the aggregate level every atom was, on average, greater than zero, including the higher-order atoms such as the double synergy ({12} → {12}). These results show that spontaneous, ongoing avalanche dynamics have a significant element of consistently synergistic activity. For a complete set of correlations between all the atoms, see S1 Fig. It is also apparent that the information transfer atoms generally have the highest absolute values.
To compare the results of the integrated information decomposition to a more established measure of systemic complexity, I compared the distribution of normalized ΦID atoms to a measure of integrated information first proposed by Balduzzi & Tononi, based on the difference between the total excess entropy and the sum of the two marginal excess entropies [60]:

$$\Phi^{WMS}(X) = E(X) - \left[E(X^1) + E(X^2)\right]$$

Typically referred to as $\Phi^{WMS}$ (WMS indicating "whole-minus-sum"), it is a useful measure of non-trivial systemic integration (see [37] for a recent exploration of $\Phi^{WMS}$ in a ΦID context). $\Phi^{WMS}$ has obvious parallels with the simple toy example of two predictors and a single target introduced in Section 1.1.1, with similar interpretations of the resulting sign (i.e. if $\Phi^{WMS} > 0$, then the system has synergistic dynamics only accessible when considering the whole as opposed to the independent parts). As with the histograms, I aggregated over all significant pairs of neurons in all the cultures, and correlated each pair's $\Phi^{WMS}$ against each of the normalized ΦID atoms. For visualization, see Fig 6. Spearman correlation found a very strong negative correlation between $\Phi^{WMS}$ and the normalized information storage atoms ({x} → {x}, ρ = −0.8, $p < 10^{-6}$, Bonferroni corrected). This is unsurprising, as information storage contributes to the marginal, within-element predictive information and contributes nothing to the higher-order interactions that comprise "integrated" information (consider the "disintegrated" toy model described above in Section 3.1). All other normalized ΦI atoms were positively correlated with $\Phi^{WMS}$. The highest correlation was with the element-wise information transfer atoms ({x} → {y}, ρ = 0.57, $p < 10^{-6}$, Bonferroni corrected). Since inter-element information transfer is a core element of systemic "integration", and considering the overall high prevalence of bivariate transfer in the data (see Fig 5), this result is unsurprising. As expected, the ΦI atoms containing higher-order synergies were all positively correlated with $\Phi^{WMS}$, with the double-synergy term having one of the highest overall correlations (ρ = 0.41, $p < 10^{-6}$, Bonferroni corrected). This is consistent with the interpretation that $\Phi^{WMS}$ is an overall measure of total systemic integration.

Fig 4. For a single culture (in this case with approximately one hundred individual neurons), one can construct sixteen different pairwise matrices, each one corresponding to a ΦI atom. This contrasts with more well-known measures of functional and effective connectivity, which produce one matrix per system, reflecting a single "kind" of statistical relationship (be it functional connectivity, effective connectivity, etc.). Integrated information decomposition, on the other hand, provides multiple "kinds" of relationship at once, allowing a far more complete picture of computational dynamics. Here, the value of each atom is normalized by the total excess entropy.

https://doi.org/10.1371/journal.pone.0282950.g004

Local ΦID analysis.
In addition to the average values of the integrated information atoms, the $I_{\tau sx}$ measure is localizable, allowing a full, sixteen-atom decomposition for every moment in time, for every pair of neurons with significant excess entropy. I can leverage this property to perform a detailed analysis of avalanches as temporally extended objects in their own right (rather than treating them as single units sampled from some heavy-tailed distribution). Across all pairs of neurons in all 31 cultures, I aggregated all avalanches of length $k > 4$, and, if at least 50 instances of avalanches of length $k$ were observed, I averaged them to create an "average profile." Prior work with dissociated culture data has shown that avalanche profiles tend to be scaled versions of one another [12] (and references therein), showing a characteristic growth and then collapse of activity over their duration (for a visualization of the average avalanche profiles, see Fig 7, upper left). For every moment in the avalanches, I computed the local excess entropy, and then performed the ΦID using the local $i_{\tau sx}$ to explore how the computational dynamics vary over the course of the avalanche. For a visualization of the profiles of the avalanches, the excess entropy, and all ΦI atoms, see Fig 7. Local ΦI atoms were not normalized, as the local excess entropy is a signed value, which complicates the interpretation of a normalized value.
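The length-binned averaging can be sketched as below. It assumes, hypothetically, that avalanches are maximal runs of non-empty bins in a per-bin population spike count, that "$k > 4$" means at least five bins, and that the same boundaries would be applied to the aligned local excess entropy or ΦI-atom series before averaging them. The function names are illustrative.

```python
from collections import defaultdict

def split_avalanches(counts):
    """Split a per-bin population spike count into maximal runs of non-empty bins."""
    avalanches, current = [], []
    for c in counts:
        if c > 0:
            current.append(c)
        elif current:
            avalanches.append(current)
            current = []
    if current:  # a run still open at the end of the recording
        avalanches.append(current)
    return avalanches

def average_profiles(avalanches, min_length=5, min_count=50):
    """Average avalanches bin by bin, separately for each length k >= min_length,
    keeping only lengths observed at least `min_count` times."""
    by_length = defaultdict(list)
    for av in avalanches:
        if len(av) >= min_length:
            by_length[len(av)].append(av)
    return {k: [sum(col) / len(group) for col in zip(*group)]
            for k, group in sorted(by_length.items()) if len(group) >= min_count}

print(split_avalanches([0, 1, 2, 0, 0, 3, 0, 1]))  # [[1, 2], [3], [1]]
```

Whether a run still open at the end of a recording should be kept is a design choice the text does not address; the sketch keeps it.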
Fig 6. Correlations between each normalized ΦI atom and $\Phi^{WMS}$ [60]. While most are generally positively correlated, the element-level information-storage atom is a dramatic outlier, with a highly significant negative correlation of −0.8. I believe this occurs because a high degree of information storage in single elements means that the future of the whole is mostly predictable from the individual parts. The more individual elements disclose about their own future, the less "integrated" information there is in the system.

Upon visual inspection, it is clear that the various ΦI atoms have distinct profiles: for example, the profiles of the element-wise information storage and transfer atoms are characteristically similar to the excess entropy profiles, with rapid increases to a peak followed by a heavy tail. In contrast, the double-synergy profile has a noisier shape, appearing to drop towards misinformation at the end of the avalanche. To explore these profile differences in more detail, I directly compared the spiking activity profiles to their associated informational profiles. I began by computing the cumulative profile for each avalanche: in the cumulative avalanche, every moment is given as the sum of all previous moments, including the current one (analogous to a cumulative probability distribution). I then scaled each distribution by dividing it by the final cumulative value, forcing all cumulative avalanches to terminate at 1. Finally, I filtered out cumulative avalanches with unusually extreme deviations, under the assumption that they were contaminated by noise. By plotting the cumulative information atom avalanche distributions against the cumulative spiking avalanche distributions, it is possible to assess how the growth and collapse of information atoms differs from the change in spiking dynamics (see Fig 8). If the information atoms tracked the spiking activity perfectly, the resulting curves would fall on the $y = x$ line. Deviations from the line of symmetry indicate a faster or slower accumulation of information than would be expected if it were perfectly correlated with spiking activity.
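The cumulative normalization and the comparison against $y = x$ can be sketched as follows. The summary statistic (the mean signed gap between the two cumulative curves) is an illustrative choice of ours rather than a quantity reported in the text, and the outlier-filtering step is omitted because its criterion is not specified. Because local atoms are signed, a cumulative information profile need not be monotone and its final value may be near zero or negative; the sketch refuses to normalize a profile that sums to exactly zero.

```python
from itertools import accumulate

def cumulative_normalized(profile):
    """Running sum of a per-bin profile divided by its final value (ends at 1)."""
    running = list(accumulate(profile))
    total = running[-1]
    if total == 0:
        raise ValueError("profile sums to zero; cannot normalize")
    # For signed local atoms a negative total flips the curve; inspect such cases separately.
    return [r / total for r in running]

def mean_gap_from_identity(info_profile, spike_profile):
    """Mean signed gap between cumulative information and cumulative spiking.

    Both curves are evaluated at the same bins, so plotting one against the
    other lies on y = x when information tracks spiking exactly. Positive
    values mean information accumulates faster ('front-heavy')."""
    if len(info_profile) != len(spike_profile):
        raise ValueError("profiles must be aligned bin by bin")
    info = cumulative_normalized(info_profile)
    spikes = cumulative_normalized(spike_profile)
    return sum(i - s for i, s in zip(info, spikes)) / len(info)

# Hypothetical avalanche: spikes spread evenly, information concentrated early.
print(cumulative_normalized([1, 1, 2]))                    # [0.25, 0.5, 1.0]
print(mean_gap_from_identity([3, 1, 0, 0], [1, 1, 1, 1]))  # 0.3125
```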
Visual inspection of the excess entropy cumulative profile reveals that avalanches are, broadly speaking, informationally "front-heavy": the local excess entropy climbs much faster than spikes accumulate (as seen by the curve climbing above the $y = x$ line), and has almost entirely "saturated" before the halfway point of the avalanche. When considering avalanches of differing lengths, this front-heaviness appears to become more pronounced for larger avalanches (for small avalanches, of length between 4 and 10, the normalized cumulative distribution curves hug the line of symmetry much more closely). This suggests that, while all spiking avalanche profiles may be roughly scaled versions of each other, that scaling is not universal when it comes to information content: larger avalanches have different information profiles than smaller ones.
The pattern displayed by the cumulative excess entropy profile is broadly mirrored by the individual ΦI atoms, although there is considerable variation between them. For example, the synergy-to-redundancy atom {12} → {1}{2} (and its mirror {1}{2} → {12}) both hug the line of symmetry much more closely. In contrast, the cumulative double redundancy profiles and the cumulative information storage profiles track the cumulative excess entropy much more closely. Interestingly, the cumulative information copy and erasure profiles ({x} → {1}{2} and {1}{2} → {x}) both reach a maximum before the end of the avalanche and then drop, indicating a transition from informative to misinformative dynamics towards the end of the activity period. The cumulative double-synergy profile shows one of the most intriguing patterns: for large avalanches, it appears to have an S-shaped profile, initially climbing rapidly during the avalanche before dropping across the line of symmetry. The significance of such a dynamic is unclear, and this finding is well worth revisiting and replicating in a future data set.
Another interesting type of variability between atoms is how the profile changes with avalanche duration. In the case of cumulative excess entropy, cumulative double redundancy, and cumulative information storage, small avalanches reliably hug the line of symmetry, and it is the larger avalanches that display interesting deviations. However, this is not the only pattern: for example, the "downward causation" atom ({12} → {x}) and the information erasure atoms both appear to display a kind of biphasic pattern: smaller avalanches (indicated by violet in Fig 8) run reliably below the line of symmetry, while large avalanches (indicated in orange) run above it.
From these results, it is clear that the ΦID framework, coupled with a localizable measure such as $I_{\tau sx}$, can provide a rich, novel approach to understanding ongoing neural activity and can reveal previously unobserved patterns. For the purposes of this paper, I restricted myself largely to qualitative analysis of local integrated information dynamics: the results presented here will require ample replication and much deeper study to determine their significance.

Discussion
In this work, I have presented a novel information-theoretic measure, $I_{\tau sx}$, a generalization of the classic Shannon mutual information that quantifies the redundant information shared between multiple sources and multiple targets. $I_{\tau sx}$ is motivated by the recently proposed Integrated Information Decomposition [3,8], which generalizes the classic single-target Partial Information Decomposition [6,7] to sets of multiple interacting sources and targets. Like all information decompositions, the ΦID is peculiar in that, while it reveals the structure of multivariate information, it lacks a crucial piece required to calculate numerical values from data: a specific redundancy function. $I_{\tau sx}$ provides this function, allowing the double redundancy lattice to be solved.
Here, the ΦID framework is used to decompose the excess entropy [1], which quantifies the total statistical dependency that constrains a system's evolution from past to future. Prior work [18] using the PID to decompose the excess entropy could reveal how the past states of individual components (and ensembles of components) constrain the future of the whole system, but provided no finer detail. Using the ΦID, it is possible to understand, at much finer resolution, how elements constrain their own futures and the futures of other elements, groups of elements, or the whole system. To demonstrate the utility of the $I_{\tau sx}$ measure, I first examined three small, completely specified toy models (each with its own enforced type of dynamic: integrated, disintegrated, or a mixture of the two) before moving on to empirical data recorded from dissociated cultures of rat cortex. I showed that both the average and local versions of $I_{\tau sx}$ revealed rich information-dynamic structures in the data, including how different kinds of "neural computation" rise and fall as part of the bursty dynamics intrinsic to the nervous system. A significant benefit of the ΦID framework is that it allows us to generalize different "kinds" of integration in a complex system such as the brain. Historically, information-theoretic approaches to integration have focused on single measures, such as integrated information theory's eponymous measure [60]. The information decomposition framework, however, reveals a multitude of different ways that groups of neurons compute their next state. Recent, promising work using fMRI data has started to relate various ΦI atoms (particularly the synergistic atoms) to macro-scale brain dynamics [24,26], as well as to the subcritical, critical, and supercritical regimes of various dynamical systems [37].
Given the wealth of data produced by modern neural recording methods, I am optimistic that there is a very wide world of possible applications of this framework.
While I have focused on the ΦID framework as a means of decomposing the excess entropy of ongoing, spontaneous neural dynamics in dissociated cultures, in principle the framework could apply to any data set with multiple, interacting predictor and predicted variables: the temporal dimension is not required. This opens up a wider range of data analyses than is accessible to the classic PID. For example, Varley & Kaminski recently used the PID to assess how varying social identities (such as race and sex) jointly disclose information about single outcomes (such as income or health status) [61]; however, outcomes themselves are not independent and may contain interesting higher-order correlations. For example, how do race and sex jointly disclose information about income and health outcomes considered together? Generalizing to a ΦID framework may reveal many meaningful dependencies within social data, as well as in many other fields where complex systems are studied.

Limitations
As currently formulated, the Iτsx function is only well-defined for discrete random variables, a feature it inherits from the original Isx measure [10]. A continuous generalization of Isx remains an area of active research [36], and it is assumed that a successful algorithm for Isx will also work for Iτsx. As it stands, the restriction to discrete random variables limits applicability. Prior work applying PID and FID to naturally continuous data such as fMRI or cardiac rhythms has used measures of redundancy that are well-defined for Gaussian distributions [24,26,37], although these measures have their own limitations, such as lacking an intuitive interpretation, being non-localizable, or requiring arbitrary thresholds or optimizations.
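To make the discreteness requirement concrete, the sketch below computes a local shared-exclusion redundancy for a small, fully enumerated discrete distribution. It assumes the local form i_sx(t; a₁; …; a_m) = log₂[P(t | a₁ ∨ … ∨ a_m) / P(t)], that is, the information the target outcome shares with the event that at least one source takes its observed value, which is how I read the single-target definition in [10]. It is the single-target Isx, not the temporal Iτsx, and the function names are hypothetical. Because the "or" event is built by summing the probability mass of explicitly listed states, the construction has no direct analogue for continuous densities.

```python
from math import log2

def _agrees(state, idx, ref):
    """True if `state` matches `ref` on every variable index in `idx`."""
    return all(state[i] == ref[i] for i in idx)

def local_isx(joint, sources, target, ref):
    """Local shared-exclusion redundancy (bits) for the realization `ref`.

    joint   : dict mapping full state tuples to probabilities (discrete only)
    sources : list of index tuples, one per (possibly multivariate) source
    target  : index tuple of the target variable(s)
    """
    def in_union(s):
        # Event "a1 OR a2 OR ...": some source takes its observed value.
        return any(_agrees(s, src, ref) for src in sources)

    p_union = sum(p for s, p in joint.items() if in_union(s))
    p_target = sum(p for s, p in joint.items() if _agrees(s, target, ref))
    p_both = sum(p for s, p in joint.items()
                 if in_union(s) and _agrees(s, target, ref))
    return log2(p_both / (p_union * p_target))

def average_isx(joint, sources, target):
    """Expected value of the local redundancy over the joint distribution."""
    return sum(p * local_isx(joint, sources, target, s)
               for s, p in joint.items() if p > 0)

# Two uniform binary sources (indices 0 and 1) and one target (index 2).
XOR = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
COPY = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

print(average_isx(XOR, [(0,), (1,)], (2,)))   # log2(2/3), about -0.585 bits
print(average_isx(COPY, [(0,), (1,)], (2,)))  # 1.0 bit
```

With a single source the expression reduces to the local mutual information, log₂[P(t | a)/P(t)], which is a quick sanity check on any implementation.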
Even if a successful generalization of Iτsx is achieved, the PID and FID frameworks scale poorly beyond the smallest systems. In the case of the PID, the number of atoms in the lattice grows with the sequence of Dedekind numbers [7]: for a system with k elements, the associated lattice has D(k) − 2 atoms. Given how fast the Dedekind sequence grows, a complete decomposition of almost any interesting natural system (which can have thousands or millions of components) is impossible. The FID framework fares even worse, since there is one temporal atom for every pair of partial information atoms in the associated PID lattice. The size of the FID lattice then grows with the mind-boggling square of the Dedekind numbers, (D(k) − 2)²: a five-element system (D(5) = 7581) has an FID lattice with 7579² = 57,441,241 atoms. Approximate heuristics such as the $\Phi^{\mathrm{WMS}}$ measure or, more recently, the O-information [62][63][64] have been proposed as efficient, if imprecise, tools for recognizing the presence of higher-order dependencies in dynamical data; however, there is still room for refinement. Another possibility might be to explore temporal PID (rather than FID) using a redundancy function equipped with a target chain rule [13]. The target chain rule generalizes the chain rule of mutual information to the PID: a redundancy function $i_\cap$ follows the rule if $i_\cap(\alpha; y_1, y_2) = i_\cap(\alpha; y_1) + i_\cap(\alpha; y_2 \mid y_1)$. Conveniently, the isx measure satisfies the target chain rule, so a future avenue of research is to compare the results of iτsx with the chained isx.
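The lattice-size formulas above can be checked directly for small k. The sketch below (function names hypothetical) counts monotone Boolean functions, whose number for k variables is the Dedekind number D(k), by building them recursively as truth tables, and then applies D(k) − 2 for the PID and (D(k) − 2)² for the FID. The recursion uses the fact that a monotone function of k + 1 variables corresponds to a pair (f, g) of monotone functions of k variables with f ≤ g pointwise. Explicit enumeration becomes impractical beyond k = 5, which is itself a small illustration of the scaling problem.

```python
def monotone_functions(n):
    """All monotone Boolean functions of n variables, as truth-table bitmasks.

    Bit x of a mask is the function's output on input x (an n-bit integer).
    """
    funcs = [0, 1]  # n = 0: constant False and constant True
    for k in range(n):
        width = 1 << k  # number of inputs of a k-variable function
        # New variable = 0 -> outputs of f; new variable = 1 -> outputs of g.
        # Monotonicity in the new variable requires f <= g pointwise.
        funcs = [f | (g << width) for f in funcs for g in funcs if f & g == f]
    return funcs

def dedekind(k):
    return len(monotone_functions(k))

def pid_lattice_size(k):
    return dedekind(k) - 2

def fid_lattice_size(k):
    return pid_lattice_size(k) ** 2

for k in range(1, 6):
    print(k, dedekind(k), pid_lattice_size(k), fid_lattice_size(k))
```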
The final limitation is that the structure of the FID lattice, which allows single sources to appear multiple times (e.g. {12} → {x} and {12} → {12} both incorporate the {12} source), complicates the overall behavior of the redundancy functions. For example, the original Isx function has certain provable properties (such as the global non-negativity of its informative and misinformative components) that Iτsx cannot inherit, since the structure of the lattice is different. This strongly suggests that a return to the mathematical foundations of integrated information decomposition may be in order, and that new desiderata, which may diverge from the single-target case, need to be agreed upon.
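The reuse of a single source across several temporal atoms is easy to see by listing the lattice for the smallest interesting case, k = 2. The sketch below (helper names hypothetical) enumerates PID atoms as non-empty antichains of non-empty source subsets, the usual construction of the redundancy lattice, and forms temporal atoms as ordered (past, future) pairs, one per pair of PID atoms as described in the text. It then lists every temporal atom whose past side is the {12} source.

```python
from itertools import combinations, product

def pid_lattice_atoms(k):
    """Non-empty antichains of non-empty subsets of the sources {1, ..., k}."""
    subsets = [frozenset(c) for r in range(1, k + 1)
               for c in combinations(range(1, k + 1), r)]
    atoms = []
    for r in range(1, len(subsets) + 1):
        for coll in combinations(subsets, r):
            # Antichain: no member is a proper subset of another member.
            if all(not (a < b or b < a) for a, b in combinations(coll, 2)):
                atoms.append(coll)
    return atoms

def label(atom):
    """Render an atom such as ({1}, {2}) as the string '{1}{2}'."""
    return "".join("{" + "".join(map(str, sorted(s))) + "}" for s in atom)

past = future = pid_lattice_atoms(2)
fid = list(product(past, future))  # one temporal atom per (past, future) pair
joint_source = frozenset({1, 2})
uses_12 = [f"{label(a)} -> {label(b)}" for a, b in fid if joint_source in a]

print(len(fid))  # 16
print(uses_12)
```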

Conclusions
In this work, I provide a redundancy function, Iτsx, that can be used to decompose the total information that flows from the past to the future through the "channel" of a multi-element, dynamic system. This framework, when applied to neural data, reveals a rich repertoire of complex computational dynamics that can be localized to individual moments in time. Based on the fundamental logic of information as exclusions of probability mass, Iτsx generalizes the classic Shannon entropy, and I anticipate that the work presented here will open new doors both in neuroscience specifically and in complex systems science more generally.

Dissociated culture preparation & recording
The details of the general process for preparing dissociated cultures can be found in [11]. Here I summarize the specific methodology detailed in [12], where this dataset was first introduced. Pregnant Sprague-Dawley rats (Harlan Laboratories) on day 18 of gestation were euthanized via CO₂ and the embryos removed. Embryonic hippocampal tissue was resected and dissociated en masse before being plated on Multichannel Systems 60-electrode arrays (8 × 8 grid, 200 μm electrode spacing, 30 μm electrode diameter). Spontaneous activity was recorded at 20,000 Hz for approximately 1 hour (for this analysis, all recordings longer than 60 minutes were truncated at that point). The resulting spikes were sorted with the Wave_clus algorithm [65] to infer individual neurons. Following spike sorting, the data were rebinned into 3 ms bins (approximating the average inter-spike interval across all 31 recordings).
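As a rough illustration of the last two preprocessing steps, the sketch below truncates spike-sorted spike times at 60 minutes and converts them into a binary raster with 3 ms bins. It assumes spike times are given as sample indices at 20 kHz (so one bin spans 60 samples) and that a neuron's state in a bin is 1 if it fired at least once and 0 otherwise; the function name and input format are hypothetical and are not taken from [12] or from the spike sorter.

```python
import numpy as np

def bin_spikes(spike_samples, fs=20_000, bin_ms=3.0, max_duration_s=3600.0):
    """Binary raster (neurons x bins) from per-neuron spike times in samples.

    Assumption: a bin is 1 if the neuron spiked at least once within it.
    """
    samples_per_bin = int(round(fs * bin_ms / 1000.0))  # 60 samples at 20 kHz
    max_samples = int(round(max_duration_s * fs))       # truncate at 60 minutes
    n_bins = -(-max_samples // samples_per_bin)         # ceiling division
    raster = np.zeros((len(spike_samples), n_bins), dtype=np.uint8)
    for i, times in enumerate(spike_samples):
        t = np.asarray(times, dtype=np.int64)
        t = t[(t >= 0) & (t < max_samples)]             # drop spikes past the cutoff
        raster[i, t // samples_per_bin] = 1
    return raster

# Two toy neurons observed for 10 ms (200 samples, i.e. 4 bins of 3 ms).
toy = bin_spikes([[0, 59, 60], [125]], max_duration_s=0.01)
print(toy)
```

Whether several spikes falling in one bin should be collapsed to a single 1, as here, or kept as counts is a modelling choice that this sketch simply assumes.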

Mutual information calculation & significance testing
For every pair of neurons i and j in a given culture, I calculated the mutual information between the joint state of those two neurons at time t and their joint state at time t + 1:
$$I(X^i_t, X^j_t; X^i_{t+1}, X^j_{t+1}) = H(X^i_t, X^j_t) + H(X^i_{t+1}, X^j_{t+1}) - H(X^i_t, X^j_t, X^i_{t+1}, X^j_{t+1})$$
where H(·) is the classical Shannon entropy function. I tested each pair for significance against the analytic null distribution for discrete random variables with finite alphabets [66,67], with α = 10⁻⁶, followed by Bonferroni correction. The analytic null estimator allows for very efficient estimation of p-values, requiring minimal compute time (and reducing the carbon costs associated with time-intensive high-performance computing). I used the implementation provided by JIDT [68], accessed via the IDTxl package [69] for its efficient Python interface.
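The pairwise test can be sketched for a binary raster as follows. Mutual information is computed from plug-in entropies as H(past) + H(future) − H(past, future), with each neuron pair's joint state coded as one of four symbols. For the null, the sketch uses the standard large-sample result that 2N·I (I in nats, N samples) is approximately χ²-distributed with (|A| − 1)(|B| − 1) degrees of freedom, here (4 − 1)(4 − 1) = 9; I assume this corresponds to the analytic null of [66,67], but this is not the JIDT or IDTxl code. A small exact χ² survival function for integer degrees of freedom is written out to keep the example dependency-free (scipy.stats.chi2.sf would serve equally), and Bonferroni correction is assumed to run over all pairs in a culture. All function names are hypothetical.

```python
import math
from itertools import combinations
import numpy as np

def chi2_sf(x, df):
    """Exact survival function of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    # Start from df = 2 (even) or df = 1 (odd), then step up by 2 using
    # Q(nu + 2, x) = Q(nu, x) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1).
    q, nu = (math.exp(-x / 2), 2) if df % 2 == 0 else (math.erfc(math.sqrt(x / 2)), 1)
    while nu < df:
        q += math.exp((nu / 2) * math.log(x / 2) - x / 2 - math.lgamma(nu / 2 + 1))
        nu += 2
    return q

def entropy_bits(counts):
    """Plug-in Shannon entropy (bits) of a histogram of counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def pair_mi(raster, i, j):
    """Plug-in MI (bits) between the joint state of neurons i, j at t and t + 1."""
    x = np.asarray(raster, dtype=np.int64)
    past = 2 * x[i, :-1] + x[j, :-1]   # joint state coded as 0..3
    future = 2 * x[i, 1:] + x[j, 1:]
    joint = np.bincount(4 * past + future, minlength=16).reshape(4, 4)
    mi = (entropy_bits(joint.sum(axis=1)) + entropy_bits(joint.sum(axis=0))
          - entropy_bits(joint.ravel()))
    return mi, past.size

def significant_pairs(raster, alpha=1e-6):
    """Pairs whose MI exceeds the chi-squared null after Bonferroni correction."""
    pairs = list(combinations(range(raster.shape[0]), 2))
    threshold = alpha / len(pairs)     # Bonferroni over all pairs
    df = (4 - 1) * (4 - 1)             # two 4-symbol joint alphabets
    hits = []
    for i, j in pairs:
        mi, n = pair_mi(raster, i, j)
        g = 2 * n * math.log(2) * mi   # G statistic = 2 N I (I in nats)
        if chi2_sf(g, df) < threshold:
            hits.append((i, j, mi))
    return hits

# Toy raster: neuron 1 repeats neuron 0 one bin later; neuron 2 is independent.
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, size=5000)
x1 = np.concatenate(([0], x0[:-1]))
x2 = rng.integers(0, 2, size=5000)
print(significant_pairs(np.vstack([x0, x1, x2])))
```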

Constructing toy Boolean networks
For the integrated and disintegrated example systems, the transition probabilities were worked out by hand from first principles. The heterogeneous system was constructed based on the details provided in [18]. Briefly, a 4 × 4 transition probability matrix M was initialized, with every entry M_ij drawn from a normal distribution with unit mean and unit variance. The absolute value of each entry was taken, and the outgoing probabilities from each state were normalized to sum to one, defining a discrete probability distribution.
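A sketch of the heterogeneous toy system following the description above: draw every entry of a 4 × 4 matrix from a normal distribution with unit mean and unit variance, take absolute values, and normalize each row, reading M[i, j] as P(next state j | current state i) (the row convention is an assumption). As an illustrative extra not stated in the text, the sketch also evaluates I(X_t; X_{t+1}) under the chain's stationary distribution; for a first-order Markov chain this one-step mutual information equals the excess entropy, because the present state screens the rest of the past off from the rest of the future. The choice of the stationary rather than, for example, a uniform initial distribution, and the function names, are hypothetical.

```python
import numpy as np

def random_tpm(n_states=4, rng=None):
    """Row-stochastic matrix with entries |N(1, 1)|, normalized per row."""
    rng = np.random.default_rng() if rng is None else rng
    m = np.abs(rng.normal(loc=1.0, scale=1.0, size=(n_states, n_states)))
    return m / m.sum(axis=1, keepdims=True)

def stationary(tpm):
    """Left eigenvector of the TPM for the eigenvalue closest to 1, normalized."""
    vals, vecs = np.linalg.eig(tpm.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def one_step_mi(tpm, p=None):
    """I(X_t; X_{t+1}) in bits for state distribution p (default: stationary)."""
    p = stationary(tpm) if p is None else np.asarray(p, dtype=float)
    joint = p[:, None] * tpm  # P(x_t, x_{t+1})

    def h(q):
        q = q[q > 0]
        return float(-(q * np.log2(q)).sum())

    return h(joint.sum(axis=1)) + h(joint.sum(axis=0)) - h(joint.ravel())

M = random_tpm(rng=np.random.default_rng(1))
print(M.round(3))
print(one_step_mi(M))
```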

Excluding noisy cumulative avalanche profiles
To remove information avalanche profiles excessively contaminated by noise, I excluded any cumulative avalanche profile that had an excursion of more than 1 bit away from the y = x line or a total length greater than 2 bits. With these thresholds, I excluded on average 7.5 ± 8.18 avalanches for each FI atom. The full set of unfiltered cumulative avalanche plots is shown in S2.